Genetics Algorithm Feature Selection for Improving Aqueous Solubility Prediction

Abstract

Aqueous solubility is an important property for conducting chemical reactions of a compound. In this research, we develop several machine learning models for predicting the aqueous solubility of molecules. The open public dataset AqSolDB, which contains 9982 data points on molecule solubility, was used for model development. Several regression models were trained on the dataset and their performance was evaluated using mean absolute error. We use tree-based models for development. The results showed that the best prediction was obtained by the Categoric Boosting Regressor, achieving a mean absolute error of 0.854. The importance of the features that affected the model can also be calculated from the calculation. It is shown that the variable MolLogP is strongly correlated with aqueous solubility. To further improve our model, we selected features using a genetics algorithm for the machine learning-based models. The lowest mean absolute error obtained was 0.771, which provides an improvement over the previous calculation without feature selection.
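To make the selection step concrete, the following is a minimal sketch of genetic-algorithm feature selection scored by mean absolute error. It is not the authors' implementation: the data are synthetic rather than AqSolDB, a small k-nearest-neighbour regressor stands in for the tree-based models named in the abstract so that the example needs only the Python standard library, and every function name and parameter value (`genetic_feature_selection`, `pop_size`, `mutation_rate`, and so on) is a hypothetical choice made for illustration.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(X_train, y_train, x, mask, k=3):
    """Predict with the mean target of the k nearest training rows.

    Distances use only the features switched on in `mask`.
    Stand-in model (assumption): the abstract uses tree-based regressors.
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b, m in zip(row, x, mask) if m), target)
        for row, target in zip(X_train, y_train)
    )
    return sum(target for _, target in dists[:k]) / k


def fitness(mask, X_train, y_train, X_val, y_val):
    """Validation MAE for a feature mask (lower is better)."""
    if not any(mask):
        return float("inf")  # an empty feature set cannot predict anything
    preds = [knn_predict(X_train, y_train, x, mask) for x in X_val]
    return mean_absolute_error(y_val, preds)


def genetic_feature_selection(X_train, y_train, X_val, y_val,
                              pop_size=12, generations=10,
                              mutation_rate=0.1, seed=0):
    """Search boolean feature masks with elitism, crossover and mutation."""
    rng = random.Random(seed)
    n = len(X_train[0])
    cache = {}

    def score(mask):
        if mask not in cache:
            cache[mask] = fitness(mask, X_train, y_train, X_val, y_val)
        return cache[mask]

    population = [tuple(rng.random() < 0.5 for _ in range(n))
                  for _ in range(pop_size)]
    population[0] = (True,) * n  # seed with the all-features baseline

    for _ in range(generations):
        elite = sorted(population, key=score)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = tuple((not g) if rng.random() < mutation_rate else g
                          for g in child)      # bit-flip mutation
            children.append(child)
        population = elite + children

    best = min(population, key=score)
    return best, score(best)


def make_synthetic_data(n_samples, n_features=6, seed=1):
    """Synthetic data: only features 0 and 1 carry signal, the rest are noise."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [2.0 * row[0] - row[1] + rng.gauss(0, 0.05) for row in X]
    return X, y


X, y = make_synthetic_data(90)
X_train, y_train, X_val, y_val = X[:60], y[:60], X[60:], y[60:]

baseline_mae = fitness((True,) * 6, X_train, y_train, X_val, y_val)
best_mask, best_mae = genetic_feature_selection(X_train, y_train, X_val, y_val)

print("MAE with all features:     ", round(baseline_mae, 3))
print("MAE with selected features:", round(best_mae, 3))
print("Selected feature mask:     ", best_mask)
```

Because the all-features mask is placed in the initial population and the best half of each generation is carried over unchanged, the selected subset can never score worse than the baseline on the validation split. The reported error is optimistic, since the same split drives the search; a separate held-out test set would be needed for an unbiased estimate.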

Journal

Journal title: Journal of physics

Year: 2022

ISSN: 0022-3700, 1747-3721, 0368-3508, 1747-3713

DOI: https://doi.org/10.1088/1742-6596/2377/1/012016